[Cosmos] Share PartitionKeyRangeCache across CosmosClients targeting the same account #49560
xinlian12 wants to merge 27 commits into
Conversation
…ing the same account
Move the partition-key-range routing-map cache from per-CosmosClient to a process-wide, refcounted registry keyed by service endpoint. Multiple CosmosClient / CosmosAsyncClient instances in the same JVM targeting the same Cosmos account now share a single AsyncCacheNonBlocking instance for collection -> CollectionRoutingMap, eliminating duplicate routing-map memory and redundant /pkranges fetches.
Design
- New SharedRoutingMapCacheRegistry (process-wide singleton) holds an AsyncCacheNonBlocking per endpoint URL plus an AtomicInteger refcount. All state transitions go through ConcurrentHashMap.compute, giving atomic per-key check-and-update without a global lock.
- RxPartitionKeyRangeCache: new ctor accepts the service endpoint; underlying routingMapCache is obtained from the registry. Implements Closeable; close() releases this client's reference and is idempotent.
- RxDocumentClientImpl: passes serviceEndpoint to the cache ctor and releases the cache reference in its close() path.
- Opt-out: COSMOS.SHARED_PARTITION_KEY_RANGE_CACHE_ENABLED=false restores the pre-sharing behaviour (each client owns a private cache).
Why this is safe
- PK-range data is account-level metadata, not credential-bound.
- AsyncCacheNonBlocking already enforces single-flight per key; sharing the instance strengthens that to "single in-flight /pkranges per (account, container) across all clients".
- The two-arg back-compat ctor resolves the endpoint from the client, so existing mocked tests continue to work (mock returns null endpoint -> isolated cache, matching today's behaviour).
Tests
- New SharedRoutingMapCacheRegistryTest: acquire/release sharing, refcount eviction, idempotent release, null-endpoint isolation, opt-out flag, 32-thread concurrent acquire/release stress.
- New RxPartitionKeyRangeCacheTest cases: two caches at same endpoint share storage (verified by mock /pkranges call count = 1, not 2), caches at different endpoints stay independent, close() is idempotent.
- Existing 7 RxPartitionKeyRangeCacheTest cases unchanged and passing.
Reference
Pattern matches Python (sdk/cosmos/azure-cosmos/azure/cosmos/_routing/routing_map_provider.py), which uses module-level endpoint-keyed dicts with refcounted cleanup. Adapted to Java idioms (ConcurrentHashMap.compute instead of explicit RLock, Closeable instead of __del__).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
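The compute-based refcounting described in the Design notes above can be sketched roughly as follows. This is an illustrative sketch only, not the SDK's SharedRoutingMapCacheRegistry: the class name RefCountedRegistrySketch, its method names, and the immutable Entry holder (used here instead of the AtomicInteger the commit mentions) are assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch of an endpoint-keyed, refcounted registry. */
final class RefCountedRegistrySketch<K, V> {
    /** Immutable (value, refCount) pair; replaced atomically inside compute(). */
    private static final class Entry<V> {
        final V value;
        final int refCount;
        Entry(V value, int refCount) {
            this.value = value;
            this.refCount = refCount;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();

    /** Returns the shared value for key, creating it on first acquire. */
    V acquire(K key, Supplier<V> factory) {
        if (key == null) {
            return factory.get(); // null key: isolated, untracked instance
        }
        return entries.compute(key, (k, existing) -> {
            if (existing == null) {
                return new Entry<V>(factory.get(), 1);
            }
            return new Entry<V>(existing.value, existing.refCount + 1);
        }).value;
    }

    /** Drops one reference; the entry is evicted when the count reaches zero. */
    void release(K key, V value) {
        if (key == null) {
            return;
        }
        entries.compute(key, (k, existing) -> {
            if (existing == null || existing.value != value) {
                return existing; // stale or foreign release: ignore
            }
            // Returning null from compute() removes the mapping.
            return existing.refCount == 1 ? null : new Entry<V>(existing.value, existing.refCount - 1);
        });
    }

    int referenceCount(K key) {
        Entry<V> entry = key == null ? null : entries.get(key);
        return entry == null ? 0 : entry.refCount;
    }
}
```

Because both the increment and the evict-on-zero decision run inside compute for the same key, an acquire racing with the last release either sees the old entry (and bumps it) or sees no entry (and creates a fresh one); it never observes a half-removed state.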
Pull request overview
This PR reduces duplicated routing-map cache memory and redundant /pkranges requests by sharing the storage layer of RxPartitionKeyRangeCache across CosmosClient / CosmosAsyncClient instances that target the same Cosmos account (keyed by service endpoint), while keeping the per-client fetch path unchanged. The shared cache is managed by a process-wide, refcounted registry and can be disabled via a new system property for opt-out.
Changes:
- Introduces SharedRoutingMapCacheRegistry (endpoint-keyed, refcounted) to share AsyncCacheNonBlocking<String, CollectionRoutingMap> across clients.
- Updates RxPartitionKeyRangeCache to acquire shared storage by endpoint and to implement Closeable for refcount release on client shutdown.
- Wires RxDocumentClientImpl.close() to release the cache reference, adds config flag plumbing, and adds targeted unit tests + changelog entry.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java | Passes endpoint into the cache ctor and releases the cache reference during client close. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Configs.java | Adds COSMOS.SHARED_PARTITION_KEY_RANGE_CACHE_ENABLED flag (default true). |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/caches/SharedRoutingMapCacheRegistry.java | New process-wide singleton registry for shared routing-map cache storage with refcounted eviction. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/caches/RxPartitionKeyRangeCache.java | Splits “storage” vs “fetcher” by sourcing storage from the shared registry and adding close() ref-release. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Documents the new sharing behavior and opt-out property. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/caches/SharedRoutingMapCacheRegistryTest.java | New unit tests validating sharing, eviction, disabled behavior, and concurrency refcount correctness. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/caches/RxPartitionKeyRangeCacheTest.java | Adds tests validating cross-client sharing, cross-endpoint isolation, and idempotent close behavior. |
Copilot's findings
- Files reviewed: 7/7 changed files
- Comments generated: 1
@sdkReviewAgent
…e host matching
Switch SharedRoutingMapCacheRegistry's key type from String to URI so URI.equals(), which is case-insensitive on the host component per RFC 3986, is used for sharing identity.
Previously, two clients built with 'https://Acct.documents.azure.com/' and 'https://acct.documents.azure.com/' would fragment into two registry entries even though they target the same account. With URI as the key the two collapse into a single shared entry.
This matches the spirit of the Rust SDK, which uses Url-based equality on its AccountReference identity. Python uses raw string comparison; Java's URI gives us strictly better behaviour for free.
Added a new test (acquireTreatsHostCaseInsensitivelyMatchingUriEquals) that asserts URI.equals() considers the two casings equal AND that the registry produces a single shared entry for them. Ran 34 cache unit tests, 0 failures.
No public API change. RxPartitionKeyRangeCache's three-arg ctor still takes URI; only the internal field type changed (String -> URI).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
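The effect of switching the key type can be seen with plain JDK classes. The following is a minimal sketch; the class name UriKeySketch and its helper methods are hypothetical and exist only to count how many distinct map keys a set of endpoint strings produces.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo: String keys fragment on host casing, URI keys do not. */
final class UriKeySketch {
    /** Number of distinct keys when endpoints are compared as raw strings. */
    static int distinctStringKeys(String... endpoints) {
        Set<String> keys = new HashSet<>();
        for (String endpoint : endpoints) {
            keys.add(endpoint);
        }
        return keys.size();
    }

    /** Number of distinct keys when endpoints are compared via URI.equals/hashCode. */
    static int distinctUriKeys(String... endpoints) {
        Set<URI> keys = new HashSet<>();
        for (String endpoint : endpoints) {
            keys.add(URI.create(endpoint));
        }
        return keys.size();
    }
}
```

Note that java.net.URI compares the path component case-sensitively and does not normalise ports or trailing slashes, so only host (and scheme) casing differences collapse into one key.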
…cross-SDK consistency
Confirmed via cross-SDK review that both peer Cosmos SDKs key sharing on the user-supplied account endpoint URL, not on the account _rid:
- Python (sdk/cosmos/azure-cosmos/azure/cosmos/_routing/_routing_map_provider_common.py): _resolve_endpoint() returns client.url_connection (the input endpoint string) with no normalisation and no _rid lookup.
- Rust (sdk/cosmos/azure_data_cosmos_driver/src/models/account_reference.rs): AccountReference identity is endpoint-only via AccountEndpoint(Url) which Hash/Eq on the Url; PartialEq deliberately excludes credentials and backup endpoints. No _rid involvement.
This SDK should match. The "regional vs global endpoint to the same account" case stays a known fragmentation case across all three SDKs rather than something Java solves alone via _rid.
Why _rid keying was rejected after exploration:
1. Diverges from Python and Rust, which increases mental-model and maintenance cost for cross-SDK contributors.
2. DatabaseAccount.getResourceId() returns the empty string in emulator and some service paths where the account JSON has no _rid (Resource.java:130 delegates to JsonSerializable.getString(R_ID)). Would silently fall back and fragment differently than peers.
3. Brittle to init reorders: today GlobalEndpointManager.init() runs before cache construction, but any future refactor (lazy account fetch, offline-mode init) would silently break sharing. Endpoint URI is constructor-immutable; _rid depends on a successful prior network call.
Final shape:
- Registry keyed by URI (case-insensitive host via URI.equals).
- RxPartitionKeyRangeCache 3-arg ctor takes (client, collectionCache, serviceEndpoint URI). Two-arg ctor delegates with client.getServiceEndpoint().
- JavaDoc on SharedRoutingMapCacheRegistry now explicitly documents the cross-SDK alignment and the regional-endpoint fragmentation tradeoff.
All 34 cache unit tests still pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
✅ Review complete (35:07) Posted 7 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage
…clients
Without this safety net, a customer that forgets to call CosmosClient.close()
would pin the shared partition-key-range cache entry for the lifetime of the
JVM. The owning RxPartitionKeyRangeCache holds a strong reference to the
shared AsyncCacheNonBlocking and the registry's refcount stays > 0 forever.
Peer SDKs handle this:
- Python: __del__ in PartitionKeyRangeCache calls release() as a GC fallback
(sdk/cosmos/azure-cosmos/azure/cosmos/_routing/routing_map_provider.py L192).
- Rust: no Drop impl needed — the cache lives as a field on the driver and
Rust ownership guarantees cleanup on driver drop.
Java cannot use java.lang.ref.Cleaner because azure-cosmos targets Java 8
(verified: sdk/parents/azure-client-sdk-parent/pom.xml <source>1.8</source>).
Solution uses the pre-Cleaner pattern: PhantomReference + ReferenceQueue +
daemon reaper thread. All Java 1.2+ APIs.
Design
- SharedRoutingMapCacheRegistry holds:
* ReferenceQueue<Object> reaperQueue
* Set<OwnerPhantom> livePhantoms (concurrent) — critical for correctness:
the JVM only enqueues phantoms that are themselves still strongly
reachable, so the registry must hold them alive until processed.
* One daemon thread (cosmos-shared-pkr-cache-reaper) blocking on
reaperQueue.remove().
- acquire(URI endpoint, Object owner): registers an OwnerPhantom on the
owner, adds it to livePhantoms, returns AcquireResult { cache, phantom }.
- release(URI, cache, PhantomReference) — new 3-arg overload — clears the
phantom and removes it from livePhantoms in addition to decrementing the
refcount. This is the path RxPartitionKeyRangeCache.close() uses.
- When the owner becomes phantom-reachable, the reaper drains the queue,
logs a WARN ("Leaked (unclosed) RxPartitionKeyRangeCache detected..."),
calls release(endpoint, cache) to decrement refcount, then removes the
phantom from livePhantoms.
- close() is still the right primary path; the reaper is a safety net that
prevents permanent JVM-lifetime cache pinning, not a substitute.
Tests
- reaperReleasesSharedCacheWhenOwnerIsGarbageCollected: acquires in a helper
method (so the test frame cannot keep owner alive), polls referenceCount
while forcing System.gc() in a 15s window. Reaper warning is observable
in test output.
- promptCloseClearsPhantomSoReaperDoesNotDoubleRelease: validates the
prompt-close path clears the phantom and a subsequent GC produces no
extra release.
36 cache unit tests pass (was 34, +2 new leak tests).
Key correctness note in code
The first attempt at this had a subtle bug: acquire() returned the phantom
in AcquireResult but the registry didn't hold it. Once the test discarded
the AcquireResult, the phantom became unreachable and the JVM never enqueued
it — the reaper sat idle forever. The livePhantoms set fixes this. The
fields/JavaDoc explicitly document the why.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
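The pre-Cleaner pattern described in this commit (PhantomReference, ReferenceQueue, daemon reaper, plus a strongly held set of live phantoms) can be sketched as below. This is a simplified illustration under assumptions, not the code from the commit: the class LeakReaperSketch, the thread name, and the single global counter standing in for the per-endpoint refcount are all hypothetical.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a Java 8 compatible leak reaper. */
final class LeakReaperSketch {
    /** Phantom that remembers which endpoint to release; it must not reference the owner. */
    static final class OwnerPhantom extends PhantomReference<Object> {
        final String endpoint;
        OwnerPhantom(Object owner, ReferenceQueue<Object> queue, String endpoint) {
            super(owner, queue);
            this.endpoint = endpoint;
        }
    }

    private final ReferenceQueue<Object> reaperQueue = new ReferenceQueue<>();
    // Keeps phantoms strongly reachable; an unreachable phantom is never enqueued.
    private final Set<OwnerPhantom> livePhantoms = ConcurrentHashMap.newKeySet();
    private final AtomicInteger refCount = new AtomicInteger();

    LeakReaperSketch() {
        Thread reaper = new Thread(() -> {
            try {
                while (true) {
                    // Blocks until the GC enqueues a phantom whose owner was collected.
                    OwnerPhantom phantom = (OwnerPhantom) reaperQueue.remove();
                    if (livePhantoms.remove(phantom)) {
                        refCount.decrementAndGet(); // leaked owner: release on its behalf
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sketch-cache-reaper");
        reaper.setDaemon(true);
        reaper.start();
    }

    OwnerPhantom acquire(String endpoint, Object owner) {
        OwnerPhantom phantom = new OwnerPhantom(owner, reaperQueue, endpoint);
        livePhantoms.add(phantom);
        refCount.incrementAndGet();
        return phantom;
    }

    /** Prompt-close path; removal from livePhantoms makes the release happen once. */
    void release(OwnerPhantom phantom) {
        if (livePhantoms.remove(phantom)) {
            phantom.clear(); // a cleared phantom is never enqueued
            refCount.decrementAndGet();
        }
    }

    int referenceCount() {
        return refCount.get();
    }
}
```

In this sketch the successful removal from livePhantoms doubles as the exactly-once guard: whichever of release() or the reaper removes the phantom first performs the decrement, and the other becomes a no-op.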
… net
Replace the bespoke PhantomReference + ReferenceQueue + daemon-thread reaper with com.azure.core.util.ReferenceManager.INSTANCE, the SDK-wide singleton that already encapsulates this pattern.
ReferenceManagerImpl:
- On Java 9+ delegates reflectively to java.lang.ref.Cleaner.
- On Java 8 (our baseline) uses an internal PhantomReference + daemon thread named "azure-sdk-referencemanager", exactly the same mechanism this PR was reimplementing.
Confirmed in test output: the leak WARN is logged on the "azure-sdk-referencemanager" thread, proving the azure-core path is wired.
Why this is better:
- Reuses supported, well-tested azure-core machinery instead of rolling our own. One thread per JVM regardless of how many SDK components opt into the pattern, instead of cosmos adding its own competing thread.
- Java 9+ automatically gets the Cleaner-based implementation (better shutdown semantics, less thread-stack overhead).
- Drops ~100 lines of bespoke phantom plumbing from SharedRoutingMapCacheRegistry (OwnerPhantom inner class, livePhantoms set, reaper loop). Net negative on code we maintain.
Design notes preserved:
- The lambda registered with ReferenceManager.INSTANCE.register MUST NOT capture `owner`, otherwise the owner never becomes phantom-reachable. We capture only the endpoint URI and the cache reference (both independent of the owner) and document this constraint in code.
- ReleaseHandle is a one-shot AtomicBoolean fulfilment flag shared between the prompt close() path and the deferred ReferenceManager cleanup, so whichever runs first wins via compareAndSet and the refcount is decremented exactly once.
36 cache unit tests still pass; the leak test was renamed to referenceManagerReleasesSharedCacheWhenOwnerIsGarbageCollected to reflect the new mechanism.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
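The one-shot ReleaseHandle idea from the design notes above can be illustrated with a small sketch. The class ReleaseHandleSketch below is hypothetical and deliberately omits the azure-core registration call; it only shows how a shared AtomicBoolean lets the prompt close() path and a deferred cleanup race safely.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical one-shot release handle: the wrapped action runs at most once. */
final class ReleaseHandleSketch implements Runnable {
    private final AtomicBoolean released = new AtomicBoolean(false);
    // Must capture only owner-independent state (e.g. endpoint and cache),
    // never the owner itself, or the owner can never become phantom-reachable.
    private final Runnable releaseAction;

    ReleaseHandleSketch(Runnable releaseAction) {
        this.releaseAction = releaseAction;
    }

    /** Called by both close() and the GC-driven cleanup; the first caller wins. */
    @Override
    public void run() {
        if (released.compareAndSet(false, true)) {
            releaseAction.run();
        }
    }

    boolean isReleased() {
        return released.get();
    }
}
```

The same handle instance would be handed to both paths, so a client that is closed promptly and later garbage collected still decrements the refcount only once.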
1a27dc2 to
9b43616
Compare
Per PR feedback, comments in the shared-cache implementation were too verbose and contained cross-SDK comparisons that don't add value to maintainers reading the Java code. Trimmed everywhere:
- SharedRoutingMapCacheRegistry: removed Python/Rust comparison paragraphs, the "Cross-SDK consistency" and "Leaked-client safety net" walls of text, and condensed JavaDoc on individual methods. Kept only the critical "lambda must not capture owner" comment because it's a correctness invariant that's easy to break in a refactor.
- RxPartitionKeyRangeCache: removed the long ownerPhantom-style field comments; consolidated the class JavaDoc into two sentences.
- Configs: condensed the system-property comment to two lines.
- RxDocumentClientImpl: shortened the close-path log message.
- CHANGELOG entry: condensed to a single sentence describing the change and the opt-out flag.
- Tests: stripped the "First client / Second client" narration, the "must hit the shared cache" explanations, and the multi-paragraph preambles on the leak tests. Kept enough to explain the GC-related test setup since that's not obvious from the code.
Behavior unchanged; 36 cache unit tests still pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Renamed SharedRoutingMapCacheRegistry → SharedPartitionKeyRangeCacheRegistry for consistency with the class it serves (RxPartitionKeyRangeCache).
- Removed the test-only acquire(URI) overload that bypassed ReferenceManager registration; tests now use acquire(URI, owner) so the cleanup-action path is exercised end-to-end.
- Added clientWithServiceEndpointAcquiresAndReleasesRegistryRefcount: regression test guarding the RxDocumentClientImpl.close() → partitionKeyRangeCache.close() → refcount-- wiring. Constructs the cache via the 2-arg ctor (matching production) and asserts the refcount delta on construct and close.
- Added forceRefreshOnSharedCacheIsVisibleToSiblingClient: cross-client invalidation propagation. Client A populates → A force-refreshes after a simulated split → B's lookup sees A's refreshed value (same routing-map instance) without issuing its own /pkranges call. Asserts object identity on the shared CollectionRoutingMap.
38 cache unit tests pass (was 36).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sdkReviewAgent
Previous run failed in azure-cosmos-spark_3-3_2-12 with a scala-maven-plugin classpath flake (xsbt/ZincCompiler$sbtAnalyzer$ ClassNotFoundException) unrelated to this PR's changes (PR touches azure-cosmos core; Spark connector is unaffected). Empty commit to re-run the pipeline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
Status + a request for the exact failing test name. After two What I've fixed so far (both genuine sharing-incompatibilities, worth keeping):
What I've ruled out as the remaining culprit: Why I can't pinpoint the rest: the Ask: could you paste the failing test class+method from one |
Local validation update on the
Takeaway: the two committed fixes cover the genuine sharing-incompatible Could you confirm two things from the internal
With the name I can finish it immediately — make it sharing-aware if it's genuinely a sharing issue, or flag it as pre-existing if it fails independent of this PR. |
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
Fresh re-run confirms Combining that with the local reproduction, here's where I've landed:
Conclusion: the remaining Could you confirm from the internal failsafe report whether the failing |
Definitive CI root-cause: every remaining Key reframing: the live legs run the PR merged with upstream
Remaining: Positive validation that cache-sharing is correct: all the multi-master/multi-region direct legs that most stress the shared cache are green — Net: all PR-caused failures are fixed and validated; every remaining red is flaky, pre-existing direct-mode, or a new upstream test. No further code change is warranted. If the internal report shows the |
…e responses without an activity id The _Tcp_Fast (direct-mode) live legs deterministically failed in CosmosTracerTest.cosmosAsyncContainer with 'Argument activityId must not be null'. Root cause is a pre-existing bug in addRequestInfoForStoreResponses: it seeds activityId from the (always non-null) request-level ClientSideRequestStatistics activity id, then unconditionally overwrites it with the store response's server-side activity id, which is null for certain responses (e.g. transient transport errors in Direct mode). The null then trips the non-null contract of CosmosDiagnosticsRequestInfo in getRequestInfo(). Fall back to the request-level activity id when the store response does not carry one, mirroring the existing null-fallback used for gateway statistics in ClientSideRequestStatistics. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root-caused and fixed the deterministic Pulling the real failing test from the aggregate Root cause (pre-existing, surfaced by the Direct-mode legs): Fix (abb16811c42): fall back to the request-level activity id when the store response doesn't carry one, mirroring the null-fallback already used for gateway statistics in Validation: Re-running live tests to confirm /azp run java - cosmos - tests |
/azp run java - cosmos - tests
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.
…ition-key-range-cache # Conflicts: # sdk/cosmos/azure-cosmos/CHANGELOG.md
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
Address PR review feedback: only record the PARTITION_KEY_RANGE_LOOK_UP metadata diagnostic when a real /pkranges network fetch happens, instead of on every routing-map cache lookup.
- RxPartitionKeyRangeCache: move the diagnostic emission back into the network-fetch path (getRoutingMapForCollectionAsync); drop the per-lookup recording from tryLookupAsync / tryGetRangeByPartitionKeyRangeId. The shared-registry/Closeable lifecycle changes are kept.
- CosmosDiagnosticsContext: revert the store-response activityId null-guard (restores the original unconditional fallback); drop the stale CHANGELOG bug-fix entry that described it.
- FaultInjectionMetadataRequestRuleTests: revert to the original single-entry PARTITION_KEY_RANGE_LOOK_UP assertion (the test forces a routing-map refresh, so a network fetch is guaranteed).
- CosmosDiagnosticsTest.validateDirectModeDiagnosticsOnSuccess: stop asserting PARTITION_KEY_RANGE_LOOK_UP is present. With the shared cache a sibling client on the same endpoint may have already populated the routing map, so this client can serve the lookup from cache without a /pkranges fetch (and thus records no diagnostic).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
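Why a sibling client can legitimately record no lookup diagnostic is easier to see in a toy model. The sketch below is an assumption-laden illustration, not SDK code: FetchPathDiagnosticsSketch, its string-valued "routing map", and the plain counter standing in for the PARTITION_KEY_RANGE_LOOK_UP diagnostic are all hypothetical.

```java
import java.util.Map;

/** Hypothetical per-client view over shared routing-map storage. */
final class FetchPathDiagnosticsSketch {
    private final Map<String, String> sharedStorage; // shared across clients
    private int recordedLookUpDiagnostics;            // per-client diagnostics

    FetchPathDiagnosticsSketch(Map<String, String> sharedStorage) {
        this.sharedStorage = sharedStorage;
    }

    /** Serves from shared storage; the diagnostic is recorded only on a real fetch. */
    String tryLookup(String collectionRid) {
        return sharedStorage.computeIfAbsent(collectionRid, rid -> {
            recordedLookUpDiagnostics++; // stands in for the metadata diagnostic
            return "routing-map-for-" + rid; // stands in for the /pkranges fetch
        });
    }

    int recordedLookUpDiagnostics() {
        return recordedLookUpDiagnostics;
    }
}
```

With two instances built over the same ConcurrentHashMap, the first lookup for a collection records one diagnostic on the client that fetched, and the second client's lookup of the same collection records none, which is the situation the relaxed CosmosDiagnosticsTest assertion accounts for.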
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - cosmos - spark
/azp run java - cosmos - kafka
Azure Pipelines successfully started running 1 pipeline(s).
Azure Pipelines successfully started running 1 pipeline(s).
Description
Today every CosmosClient/CosmosAsyncClient owns its own RxPartitionKeyRangeCache, even when many clients in the same JVM are configured with the same service endpoint (a common pattern for multi-tenant / multi-credential apps and frameworks that recreate clients). The routing-map data is duplicated N times and /pkranges calls fan out N times for the same containers.
This PR moves the routing-map storage to a process-wide, refcounted registry keyed by the service endpoint URI configured on CosmosClientBuilder. The fetching path (which depends on the per-client network stack, auth, collection cache, diagnostics) stays per-client.
Design
Split RxPartitionKeyRangeCache into two layers:
- Storage: AsyncCacheNonBlocking<String, CollectionRoutingMap>. Account-level data, naturally shareable. Now obtained from SharedPartitionKeyRangeCacheRegistry (process-wide singleton) keyed by the service endpoint URI.
- Fetcher: the /pkranges fetch path; depends on the per-client RxDocumentClientImpl, RxCollectionCache, and diagnostics. Unchanged.
Scope of sharing
Two clients share the cache only when their service endpoint URIs compare equal via URI.equals (case-insensitive on host per RFC 3986). Clients configured with different endpoint URIs, including the global endpoint vs a regional endpoint of the same logical account, do not share.
The natural-looking alternative of keying by DatabaseAccount.getId() (so global + regional clients of the same account would share) was tried and rejected: the id returned from a regional endpoint is <globalId>-<service-normalised-region>, and recovering the global form requires brittle suffix-stripping against the readable/writable locations list. DatabaseAccount.getResourceId() (the _rid field) is not a documented canonical id at the protocol level. Rather than ship a fragile canonicalisation, the registry honestly keys on the builder-supplied URI.
Concurrency model
All registry state transitions go through ConcurrentHashMap.compute(...), which provides atomic per-key check-and-update.
Lifecycle
- RxPartitionKeyRangeCache ctor acquires from the registry (bumps refcount).
- RxPartitionKeyRangeCache implements Closeable; close() releases the refcount and is idempotent (guarded by AtomicBoolean).
- RxDocumentClientImpl.close() calls LifeCycleUtils.closeQuietly(partitionKeyRangeCache).
- com.azure.core.util.ReferenceManager: if a client is GC'd without calling close(), the cleanup decrements the refcount once. A WARN log identifies the leaking endpoint.
Diagnostics
Diagnostics behaviour is unchanged: the PARTITION_KEY_RANGE_LOOK_UP metadata diagnostic is recorded only when a real /pkranges network fetch happens (inside getRoutingMapForCollectionAsync), exactly as before this change. A consequence of sharing the routing-map storage is that a client can serve a PK-range lookup from a cache already populated by a sibling client on the same endpoint without issuing any /pkranges fetch, in which case no PARTITION_KEY_RANGE_LOOK_UP diagnostic is recorded for that operation. Tests that previously assumed the diagnostic is always present were updated accordingly: CosmosDiagnosticsTest.validateDirectModeDiagnosticsOnSuccess no longer asserts its presence, while FaultInjectionMetadataRequestRuleTests keeps its original single-entry assertion because it forces a routing-map refresh (so a network fetch, and with it the delayed diagnostic, is guaranteed).
Opt-out
System property COSMOS.SHARED_PARTITION_KEY_RANGE_CACHE_ENABLED=false restores per-client private caches.
Files
| File | Description |
|---|---|
| caches/SharedPartitionKeyRangeCacheRegistry.java | … URI |
| caches/RxPartitionKeyRangeCache.java | … (client, collectionCache, URI); registry-backed storage; idempotent close() (diagnostics emission unchanged: still recorded only on the /pkranges network fetch path) |
| Configs.java | … COSMOS.SHARED_PARTITION_KEY_RANGE_CACHE_ENABLED (default: enabled) |
| RxDocumentClientImpl.java | … this.serviceEndpoint to the cache ctor; release the cache in close() |
| caches/SharedPartitionKeyRangeCacheRegistryTest.java | … |
| caches/RxPartitionKeyRangeCacheTest.java | … |
| SharedPartitionKeyRangeCacheE2ETest.java | … CosmosClientBuilder normalises the endpoint URI so two distinct connectable endpoints can't be built in a single-endpoint test environment) |
| CHANGELOG.md | … |
Test plan
- mvn install (azure-cosmos)
- mvn checkstyle:check spotbugs:check (azure-cosmos + azure-cosmos-tests)
- Unit tests (RxPartitionKeyRangeCacheTest + SharedPartitionKeyRangeCacheRegistryTest)
- E2E test (SharedPartitionKeyRangeCacheE2ETest) registered under the emulator and fast Maven profiles, executed in CI against the configured Cosmos endpoint.
Key behavioural tests (unit)
- twoCachesForSameEndpointShareRoutingMapStorage: client A populates the routing map, client B serves the same lookup with clientB.readPartitionKeyRanges invoked zero times.
- cachesForDifferentEndpointsDoNotShareStorage: clients with different endpoint URIs each invoke their own readPartitionKeyRanges exactly once.
- forceRefreshOnSharedCacheIsVisibleToSiblingClient: client A's force-refresh propagates to client B without B issuing its own fetch.
- closeIsIdempotent: repeated close() calls do not drive refcount negative.
- clientWithServiceEndpointAcquiresAndReleasesRegistryRefcount: regression guard for the RxDocumentClientImpl.close() → partitionKeyRangeCache.close() wiring.
- concurrentAcquireAndReleaseProducesConsistentRefcount: 32 threads × 200 ops, refcount ends at 0.
- referenceManagerReleasesSharedCacheWhenOwnerIsGarbageCollected: leak-safety net; an unclosed client is reclaimed by ReferenceManager once GC'd.
- acquireTreatsHostCaseInsensitivelyMatchingUriEquals: RFC 3986 host case-insensitivity flows through to the registry key.
- regionalAndGlobalEndpointsDoNotShareStorage: pins the explicit scope; distinct endpoint URIs use distinct registry entries.
- disabledFlagReturnsIsolatedCachesAndPreservesRegistryEmpty: opt-out preserves pre-sharing behaviour.
Key behavioural tests (e2e, real Cosmos endpoint)
- twoClientsOnSameEndpointShareRoutingMapStorage: spins up two real CosmosAsyncClients configured with the same endpoint, performs PK-routed reads on both, and asserts they share the same AsyncCacheNonBlocking instance, the registry refcount accounts for both holders, and closing each client decrements the refcount by exactly one.
- Cross-endpoint isolation is covered by unit tests (SharedPartitionKeyRangeCacheRegistryTest.acquireReturnsDifferentInstanceForDifferentEndpoints / regionalAndGlobalEndpointsDoNotShareStorage and RxPartitionKeyRangeCacheTest.cachesForDifferentEndpointsDoNotShareStorage) rather than e2e, because CosmosClientBuilder.validateConfig() strips path/query so two distinct connectable endpoint URIs can't be constructed against a single test endpoint.
Breaking changes
None. RxPartitionKeyRangeCache is in the implementation package; its ctor signature and its new Closeable supertype are not part of the public API surface. No customer-visible APIs change.